Bootstrapping Language Description: the case of Mpiemo (Bantu A, Central African Republic)

Authors

  • Harald Hammarström
  • Christina Thornell
  • Malin Petzell
  • Torbjörn Westerlund
Abstract

Linguists have long been producing grammatical descriptions of as yet undescribed languages. This is a time-consuming process, which has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap the analysis of collected data and speed up manual selection work. More precisely, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful. We exemplify this on Mpiemo, a so far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.

Similar resources

Morphological function, syllabic and phonetic form of nasal+plosive combinations in the Bantu language Mpiemo

A discussion of how to handle consonant combinations in the Bantu language Mpiemo, spoken in the southwestern border region of the Central African Republic, is presented. The question is raised whether nasal+consonant combinations are adequately analysed as single phonological units or as separate ones. Phonetic, syllabic and morphological aspects are taken into consideration.

Acoustic properties of implosives in Bantu Mpiemo

Previous studies on implosives have shown a great diversity in the production of implosives among the languages in the world. In the light of this, this paper seeks to identify the acoustic phonetic properties of a Bantu language, Mpiemo, spoken in the Central African Republic. One of the strong acoustic correlates of implosives is increasing voicing amplitude during occlusion, which contrasts ...

Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs

The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...

Mitochondrial, Y-chromosomal and autosomal variation in Mbenzele Pygmies from the Central African Republic.

In this paper, we carry out a combined analysis of autosomal (ten microsatellites and an Alu insertion), mitochondrial (HVR-1 sequence, 360 nucleotides) and Y-chromosomal (seven microsatellites) variation in the Mbenzele Pygmies from the Central African Republic. This study focuses on two important questions concerning the admixture and origin of African Pygmies. Ethnographic observations sugge...

Molecular epidemiology of human polyomavirus JC in the Biaka Pygmies and Bantu of Central Africa.

Polyomavirus JC (JCV) is ubiquitous in humans and causes a chronic demyelinating disease of the central nervous system, progressive multifocal leukoencephalopathy which is common in AIDS. JCV is excreted in urine of 30-70% of adults worldwide. Based on sequence analysis of JCV complete genomes or fragments thereof, JCV can be classified into geographically derived genotypes. Types 1 and 2 are o...

Publication date: 2008